Corpus-oriented Acquisition of Chinese Grammar
Abstract
The acquisition of grammar from a corpus is a challenging task in the preparation of a knowledge bank. In this paper, we discuss the extraction of Chinese grammar oriented to a restricted corpus. First, probabilistic context-free grammars (PCFG) are extracted automatically from the Penn Chinese Treebank and are regarded as the baseline rules. Then a corpus-oriented grammar is developed by adding specific information, including head information, from the restricted corpus. Next, we describe the peculiarities and ambiguities in the extracted grammar, particularly between the phrases “PP” and “VP”. Finally, the parsing results of the utterances are used to evaluate the extracted grammar.
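As an illustration of the baseline step, the following is a minimal sketch of reading PCFG rules off bracketed trees and estimating their probabilities by relative frequency. It is not the authors' implementation: the function names (`parse_tree`, `count_rules`, `estimate_pcfg`) and the two toy trees with romanized words are hypothetical, chosen only to show a "PP" attached inside a "VP", and the head information mentioned in the abstract is not modeled.

```python
from collections import Counter

def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()

def count_rules(tree, counts):
    """Count one rule (lhs -> rhs) per tree node, recursively."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)

def estimate_pcfg(tree_strings):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter()
    for s in tree_strings:
        count_rules(parse_tree(s), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy trees using treebank-style labels (IP, NP, VP, PP).
toy_trees = [
    "(IP (NP (NN ta)) (VP (VV lai)))",
    "(IP (NP (NN ta)) (VP (PP (P zai) (NP (NN jia))) (VP (VV chi))))",
]

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(toy_trees).items()):
        print(f"{lhs} -> {' '.join(rhs)}  [{p:.3f}]")
```

In this toy sample, "VP" is expanded three times, twice as `VP -> VV` and once as `VP -> PP VP`, so the estimates are 2/3 and 1/3. A real extraction would read the trees from the treebank files and would likely need smoothing for rare rules.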
Similar resources
Knowledge Acquisition And Chinese Parsing Based On Corpus
In Natural Language Processing (NLP), one key problem is how to design a robust and effective parsing system. In this paper, we will introduce a corpus-based Chinese parsing system. Our efforts are concentrated on: (1) knowledge acquisition and representation; and (2) the parsing scheme. The knowledge of this system is principally extracted from analyzed corpus, others are a few grammatical princ...
Incorporating Cognitive Linguistic Insights into Classrooms: the Case of Iranian Learners’ Acquisition of If-Clauses
Cognitive linguistics gives the most inclusive, consistent description of how language is organized, used and learned to date. Cognitive linguistics contains a great number of concepts that are useful to second language learners. If-clauses in English, on the other hand, remain intriguing for foreign language learners to struggle with, due to their intrinsic intricacies. EFL grammar books are ...
The L2 Acquisition of the Chinese Aspect Marking
By analyzing corpus data, we have shown that the tendencies of restricting perfective past marking to Accomplishments and Achievements and imperfective marking to Statives and Activities as described by the Aspect Hypothesis (Shirai, 1991; Andersen & Shirai, 1996), undesirable in the acquisition of various languages, are desirable in the acquisition of a language like Chinese, because these ten...
Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation
This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...
Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank
Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chines...